What is the role of NLP in text retrieval?
Abstract
This paper addresses the value of linguistically-motivated indexing (LMI) for document and text retrieval. After reviewing the basic concepts involved and the assumptions on which LMI is based, namely that complex index descriptions and terms are necessary, I consider past and recent research on LMI, and specifically on automated LMI via NLP. Experiments in the first phase of research, up to the late eighties, did not demonstrate value in LMI but were very limited; the much larger full-text tests of the nineties have not done so either. My conclusion is that LMI is not needed for effective retrieval, but has other important roles within information-selection systems.
The rapid growth of full text databases, together with developments in natural language processing (NLP) technology, has prompted those engaged with NLP to suggest that it could be usefully applied to text retrieval, primarily for indexing purposes but perhaps also for more or less related tasks such as document ‘abstracting’ or extracting. It could be applied at shallow text as well as at deep content levels, and for user display or for database creation. Retrieval itself has various modes, including filtering or routing as well as one-off searching; the family of information and text processing tasks in which it figures includes categorisation for various purposes; and the material dealt with extends, for example, into hypertext. The claim is that linguistically-motivated analysis, and hence NLP, is needed not only for tasks like information extraction but even, in the current demanding circumstances where vast volumes of machine-readable material are becoming available, for such simple ones as document retrieval.
Further, NLP may also be needed to provide useful linkage between data and task types within the family of information-selection tasks, by supplying concept representations that may be exploited for related purposes (as with data and document query), as well as for friendlier information supply to the user (as with a display of key document concepts). In what follows I shall concentrate on document (text) retrieval, returning briefly to other tasks later. Thus the questions to be examined are:
1. What indexing and searching devices depend on linguistically-motivated analysis?
2. How do these devices depend on such analysis?
3. Does automation (i.e. NLP) affect the type of device used, or the way it is created and manipulated?